Abstract

We present a quantitative analysis of various (syntactic and behavioral)
properties of random λ-terms. Our main results are that asymptotically all
the terms are strongly normalizing and that any fixed closed term almost never
appears in a random term. Surprisingly, in combinatory logic (the translation
of the λ-calculus into combinators) the result is exactly opposite. We show
that almost all terms are not strongly normalizing. This is due to the fact that
any fixed combinator almost always appears in a random combinator.
1 Introduction

Since the pioneering works of Church, Turing et al., more than 70 years ago, a wide
range of computational models have been introduced. It turns out that they are
all equivalent in what they can compute. However, this equivalence says nothing
about what typical programs or machines of each of these models do.

This paper addresses the following question. Given a (theoretical) programming
language and a property, what is the probability that a random program
satisfies the given property? In particular, is it true that almost every random
program satisfies the desired property?

We concentrate on functional programming languages and, more specifically, on
the λ-calculus, the simplest such language (see [11|TT1[T3] for similar works on other
models of computation). The only work that we have found on this subject is some
experiments made by Jue Wang (see [16]). Most interesting properties of terms are
those concerning their behavior. However, to analyze them, one has to consider
some syntactic properties as well.

As far as we know, no asymptotic value for the number of λ-terms of size n is
known. We give (see Section 6) upper and lower bounds for this (super-exponential)
number. Although the gap between the lower and the upper bound is big
(exponential), these estimations are sufficient for our purpose.
We prove several results on the structural form of a λ-term. In particular, we
show that almost every closed λ-term begins with "many" λ's (the precise meaning
is given in Theorem 14). Moreover, each of them binds "many" occurrences of
variables (Theorems 15, 16 and 17). Finally, given any fixed closed λ-term, almost
no λ-term has this term as a sub-term (Theorem 20).

We also give a result on the behavior of terms, our original motivation. We
show that a random term is strongly normalizing (SN for short) with probability
1. Remember that, in general, deciding whether a term is SN is an undecidable problem.

Combinatory logic is another programming language related to the λ-calculus.
It can be seen as an encoding of the λ-calculus into a language without variable binding.
Moreover, there are translations, in both directions, which, for example, preserve
the property of being SN. Surprisingly, our results concerning random combinators
are very different from those for the λ-calculus. For example we show that, for
every fixed term $t_0$, almost every term has $t_0$ as a sub-term and this, of course,
implies that almost every term is not SN. The difference between the results on
strong normalization in the λ-calculus and in combinatory logic might come from
the large increase of size induced by the coding of bound variables in combinatory
logic. This is discussed in Section 9.

Our interest in statistical properties of computational objects like λ-terms
or combinators is a natural extension of similar research on logical objects like
formulas or proofs. This paper is a continuation of the research in which we try to
estimate the properties of random formulas in various logics, especially the
probability of truth (or satisfiability) for random formulas. For the purely implicational
logic with one variable (and at the same time simple type systems), the exact value
of the density of true formulas has been computed in the papers of Moczurad,
Tyszkiewicz and Zaionc [13] and [15]. The quantitative relationship between
intuitionistic and classical logics (based on the same language) has also been analyzed. The
exact value describing how big a fragment of the classical logic with one variable is
intuitionistic has been determined by Kostrzycka and Zaionc [10]. For the results
with more than one variable and other logical connectives, consult [1],[H],[7].

The case of and/or connectors has received much attention; see Lefmann and
Savicky JJj, Chauvin, Flajolet, Gardy and Gittenberger [2] and Gardy and Woods [B].
We refer to Gardy [5] for a survey on probability distributions on Boolean functions
induced by random Boolean expressions.

2 Organization of the paper

In Sections 3 and 4 we recall basic definitions of the λ-calculus and introduce the
notation used within the paper. Section 5 summarizes the basic combinatorial facts
that are useful in our development. Starting from Section 6 we present our results
for the λ-calculus.

Section 8 contains results in combinatory logic, namely that every fixed term
appears in almost every term. Section 9 discusses the question of size and gives
experimental results for questions for which we have no answers. It also gives open
questions and proposes future directions of research.

In the whole paper we do not aim at providing the best possible estimations for
the analyzed sequences. Most of the quantitative results can be easily improved. We
present estimations which are sufficient for our structural results, without sacrificing
the simplicity of proofs for better accuracy.
3 Generalities on the λ-calculus

Definition 1. The set of λ-terms (or simply terms) is defined by the following
grammar (where $V$ is a countable set of variables)

$$t, u ::= V \mid \lambda V.t \mid (t\ u)$$

We denote by $\Lambda$ the set of all closed λ-terms.

As usual, λ-terms are considered modulo α-equivalence, i.e. two terms which
differ only by the names of bound variables are considered equal. Note that λ-terms
can be seen as trees. For every term $t$, if we forget about variable binding we obtain
a unary-binary tree. We call it the structure of $t$. Removing the unary nodes from the
structure and connecting the binary nodes and leaves so as to preserve the original
connectivity, we obtain the binary structure of $t$.

We often use (without giving the precise definition) the classical terminology
about trees (e.g. path, root, leaf, etc.). The paths from the root to the leaves are
called branches.

Definition 2.

1. $t'$ is a sub-term of $t$ (denoted as $t' \le t$) if

• $t = t'$,

• or $t = \lambda x.u$ and $t' \le u$,

• or $t = (u\ v)$ and ($t' \le u$ or $t' \le v$).

2. Let $t$ be a term, and $u = \lambda x.a$ be a sub-term of $t$. We say that $\lambda x$ is binding
if $x$ has a free occurrence in $a$.

3. The unary height of a term $t$ is the maximal number of λ's on a path from the
root to some leaf of $t$.

4. Let $t$ be a λ-term.

• Two λ's in $t$ are called incomparable if there is no branch containing both
of them.

• The λ-width of $t$ is the maximal number of pairwise incomparable binding λ's.

5. We say that a term has $k$ head lambdas if its structure starts with $k$ unary
nodes followed by a binary node or a leaf.

Definition 3. The size of a term $t$ (denoted as $\mathrm{size}(t)$) is defined by the following
rules.

- $\mathrm{size}(x) = 0$ if $x$ is a variable,

- $\mathrm{size}(\lambda x.t) = \mathrm{size}(t) + 1$,

- $\mathrm{size}((t\ u)) = \mathrm{size}(t) + \mathrm{size}(u) + 1$.
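The notions of Definitions 2 and 3 can be made concrete with a small sketch. The tuple encoding of terms and all function names below are assumptions made for illustration only; they are not notation from the paper.

```python
# Terms as nested tuples (an encoding chosen for this sketch only):
#   ("var", x) | ("lam", x, body) | ("app", t, u)

def size(t):
    """Size as in Definition 3: variables cost 0, each lambda and application costs 1."""
    if t[0] == "var":
        return 0
    if t[0] == "lam":
        return size(t[2]) + 1
    return size(t[1]) + size(t[2]) + 1

def free_vars(t):
    """Set of variable names occurring free in t."""
    if t[0] == "var":
        return {t[1]}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def is_binding(lam):
    """A lambda is binding if its variable occurs free in its body."""
    return lam[1] in free_vars(lam[2])

def head_lambdas(t):
    """Number of unary nodes at the top of the structure of t."""
    k = 0
    while t[0] == "lam":
        k, t = k + 1, t[2]
    return k

def unary_height(t):
    """Maximal number of lambdas on a path from the root to a leaf."""
    if t[0] == "var":
        return 0
    if t[0] == "lam":
        return 1 + unary_height(t[2])
    return max(unary_height(t[1]), unary_height(t[2]))

def lambda_width(t):
    """Maximal number of pairwise incomparable binding lambdas."""
    if t[0] == "var":
        return 0
    if t[0] == "lam":
        w = lambda_width(t[2])
        # A binding lambda alone is an antichain of size 1; it is comparable
        # with every lambda below it, so it cannot be added to a larger one.
        return max(w, 1) if is_binding(t) else w
    return lambda_width(t[1]) + lambda_width(t[2])

# Example: \x.\y.((\z.z) (\w.x))
example = ("lam", "x", ("lam", "y",
           ("app", ("lam", "z", ("var", "z")), ("lam", "w", ("var", "x")))))
```

The width is computed bottom-up: at an application the antichains of the two sides can be combined freely, whereas a binding λ can only stand alone.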

Definition 4. Let $n$ be an integer. We denote by $\Lambda_n$ the set of closed terms of size
$n$. Obviously the set $\Lambda_n$ is finite. We denote by $L_n$ its cardinality.

As far as we know, no asymptotic analysis of the sequence $L_n$ has been made.
Moreover, typical combinatorial techniques do not seem to apply easily to this
task.
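Although no asymptotic formula is available, exact values of $L_n$ for small $n$ are easy to compute. The following sketch counts terms modulo α-equivalence by tracking how many variables are in scope (a de Bruijn-style recurrence); the function names are illustrative assumptions, not part of the paper.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_terms(n, m):
    """Number of terms of size n (modulo alpha) whose free variables
    are taken among m variables already in scope."""
    if n == 0:
        # A term of size 0 is a single variable: m choices.
        return m
    # Either a lambda (one more variable in scope for the body) ...
    total = count_terms(n - 1, m + 1)
    # ... or an application whose two sides share the remaining n - 1 nodes.
    for i in range(n):
        total += count_terms(i, m) * count_terms(n - 1 - i, m)
    return total

def L(n):
    """Number of closed terms of size n."""
    return count_terms(n, 0)
```

For instance, `L(2)` counts the three closed terms λx.λy.x, λx.λy.y and λx.(x x).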
4 Main notations

To attribute a precise meaning to sentences like "asymptotically all λ-terms
have property P" we use the following definition of asymptotic density. For any
finite set $A$ we denote by $\#A$ the number of its elements.

Definition 5. For a set of λ-terms $A$, we denote by $d(A)$ the following limit
(if it exists):

$$d(A) = \lim_{n \to \infty} \frac{\#(A \cap \Lambda_n)}{\#(\Lambda_n)}.$$

If the limit exists it is called the asymptotic density of $A$.

Note that $d$ is not a measure (e.g. it is not countably additive).
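For small sizes the quotient $\#(A \cap \Lambda_n)/\#(\Lambda_n)$ can be computed by brute-force enumeration. The sketch below does this for closed terms in de Bruijn notation; the encoding, the helper names and the sample property (at least two head λ's) are assumptions chosen for illustration. It only evaluates the quotient at a fixed $n \ge 1$ and says nothing about the limit.

```python
from fractions import Fraction

def terms(n, m=0):
    """Yield all terms of size n in de Bruijn notation with free indices below m."""
    if n == 0:
        for i in range(m):
            yield ("var", i)
        return
    for body in terms(n - 1, m + 1):
        yield ("lam", body)
    for i in range(n):
        for t in terms(i, m):
            for u in terms(n - 1 - i, m):
                yield ("app", t, u)

def head_lambdas(t):
    """Number of lambdas at the top of t."""
    k = 0
    while t[0] == "lam":
        k, t = k + 1, t[1]
    return k

def proportion(n, prop):
    """Exact proportion of closed terms of size n (n >= 1) satisfying prop."""
    closed = list(terms(n))
    return Fraction(sum(1 for t in closed if prop(t)), len(closed))

def at_least_two_heads(t):
    return head_lambdas(t) >= 2
```

Calling `proportion(n, at_least_two_heads)` for increasing `n` gives the first values of the sequence whose limit, if it exists, is the density of that property.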

Let $P$ be a property of λ-terms. If $d(\{t \in \Lambda \mid P(t) \text{ holds}\}) = \alpha$, we say
that the density of terms satisfying $P$ is $\alpha$. By analogy with research on graphs
and trees, we abbreviate a sentence like "the density of terms satisfying P is 1"
by "a random term satisfies P".
The idea of proofs

In Section 6, a density is proved to be 0 by computing bounds for the cardinalities
of the sets we consider, and showing that the quotient tends to zero. The
computations are quite standard and have been checked with Maple. The
corresponding file, together with a pdf of it, can be found at the URL:
www.lama.univ-savoie.fr/~david/ftp/limit

In Section 7 we show that a set $A$ of terms has density 0 by defining an injective,
size-preserving function $\varphi$ from $A$ into $\Lambda$ (we call such functions codings). Then we
show that the image of $\varphi$ has density 0. This is done either by using the fact that
it is included in a set which is already known to have density 0, or by computing
an upper bound for the cardinality of this image.

The proofs concerning densities in the calculus of combinators are based on the
analysis of generating functions enumerating the considered sets of combinators.

Note about the statement of the theorems

1. Many of the following sub-sections use results of the previous ones. When, in
some section, we say "let t be a typical term", this implicitly means that we
restrict ourselves to terms having the properties which, as we have seen in
the previous sub-sections, have density 1. We also assume that its size
is big enough.

2. The statement of the theorems sometimes requires giving a name to the size
of terms. This size is always denoted by $n$. Thus a statement "the density of
terms satisfying P(t, n) is α" means that

$$\lim_{n \to \infty} \frac{\#(\{t \in \Lambda_n \mid P(t,n)\})}{\#(\Lambda_n)} = \alpha.$$

5 Combinatorial results
5.1 Catalan numbers

We denote by $C(n)$ the Catalan numbers, i.e. the number of binary trees with $n$
inner nodes. We use the following proposition.

Proposition 6. $C(n) \sim \frac{4^n}{n^{3/2}\sqrt{\pi}}$ and thus, for large enough $n$, we have
$C(n) \ge C\,\frac{4^n}{n^{3/2}}$ for some constant $C > 0$.

Proof. This is a classical result. See for example [3]. □
5.2 Large Schröder numbers

We denote by $M(n,k)$ the number of unary-binary trees with $n$ inner nodes and $k$
leaves. Let $M(n) = \sum_{k \ge 1} M(n,k)$ denote the number of unary-binary trees with $n$
inner nodes. These numbers are known as the large Schröder numbers. Note that,
since in this paper the size of a variable is 0, we use them instead of the so-called
Motzkin numbers, which enumerate unary-binary trees with $n$ total nodes. We use
the following proposition.

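Proposition 7.

1. $M(n,k) = C(k-1)\binom{n+k-1}{n-k+1}$.

2. $M(n) \sim c \left(\frac{1}{3-2\sqrt{2}}\right)^{n} \frac{1}{n^{3/2}}$ for some constant $c > 0$.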

Proof. (1) Every unary-binary tree with $n$ inner nodes and $k$ leaves has $k-1$
binary and $n-k+1$ unary nodes. We have $C(k-1)$ binary trees with $k$ leaves.
Every such tree has $2k-1$ nodes (inner nodes and leaves). Therefore there are
$\binom{n+k-1}{n-k+1}$ possibilities of inserting $n-k+1$ unary nodes (we can put unary nodes
above every node of a binary tree). (2) The asymptotic for $M(n)$ is obtained by
using standard tools of generating functions (for this sequence it is equal to
$m(x) = \frac{1 - x - \sqrt{1 - 6x + x^2}}{2x}$). For more details see [3]. □
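The closed form for $M(n,k)$ can be checked numerically against a direct recursive count of unary-binary trees. This is a minimal sketch; the function names are illustrative assumptions.

```python
from functools import lru_cache
from math import comb

def catalan(n):
    """Number of binary trees with n inner nodes."""
    return comb(2 * n, n) // (n + 1)

def M_nk(n, k):
    """Unary-binary trees with n inner nodes and k leaves (closed form of item 1)."""
    if k < 1 or k > n + 1:
        return 0
    return catalan(k - 1) * comb(n + k - 1, n - k + 1)

def M(n):
    """Unary-binary trees with n inner nodes, summing over the number of leaves."""
    return sum(M_nk(n, k) for k in range(1, n + 2))

@lru_cache(maxsize=None)
def M_direct(n):
    """Same count by structural recursion: a leaf, a unary root, or a binary root."""
    if n == 0:
        return 1
    return M_direct(n - 1) + sum(M_direct(i) * M_direct(n - 1 - i) for i in range(n))
```

Both functions should agree for every `n`; the first values are the large Schröder numbers 1, 2, 6, 22, 90.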



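6 Proofs using calculus
6.1 Lower bound for $L_n$

The asymptotic inequality $f(n) \gtrsim g(n)$ means that there exists $h$ such that
$h(n) \sim g(n)$ and $f(n) \ge h(n)$.

Theorem 8. For any $\varepsilon > 0$ we have

$$L_n \gtrsim \left(\frac{(4-\varepsilon)\,n}{\ln(n)}\right)^{n - \frac{n}{\ln(n)}}.$$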

Proof. Let $LB(n,k)$ be the number of λ-terms of size $n$ with $k$ head λ's and no other
λ below. Since the lower part of the term is a binary tree with $n-k$ inner nodes,
and each leaf can be bound by $k$ lambdas, we have $LB(n,k) = C(n-k)\,k^{\,n-k+1}$.
Clearly $LB(n,k) \le L_n$. We choose $k = \left\lceil \frac{n}{\ln(n)} \right\rceil$. Then we get:

$$L_n \ge C\!\left(n - \left\lceil \tfrac{n}{\ln(n)} \right\rceil\right) \left(\frac{n}{\ln(n)}\right)^{n - \left\lceil \frac{n}{\ln(n)} \right\rceil + 1} \gtrsim \left(\frac{4n}{\ln(n)}\right)^{n - \frac{n}{\ln(n)}} \frac{1}{p(n)},$$

for some positive polynomial $p$ (the last asymptotic inequality is a consequence of
Proposition 6). It is easy to see that the last formula is asymptotically greater than

$$\left(\frac{(4-\varepsilon)\,n}{\ln(n)}\right)^{n - \frac{n}{\ln(n)}}.$$

□
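The inequality $LB(n,k) \le L_n$ used in this proof can be observed on small sizes. The sketch below assumes the formula $LB(n,k) = C(n-k)\,k^{n-k+1}$ and the choice $k = \lceil n/\ln(n) \rceil$ from the proof; the exact count of closed terms uses a de Bruijn-style recurrence, and all names are illustrative.

```python
import math
from functools import lru_cache

def catalan(n):
    """Number of binary trees with n inner nodes."""
    return math.comb(2 * n, n) // (n + 1)

@lru_cache(maxsize=None)
def count_terms(n, m):
    """Terms of size n (modulo alpha) with free variables among m in scope."""
    if n == 0:
        return m
    total = count_terms(n - 1, m + 1)
    for i in range(n):
        total += count_terms(i, m) * count_terms(n - 1 - i, m)
    return total

def LB(n, k):
    """Terms of size n made of k head lambdas followed by a lambda-free body."""
    return catalan(n - k) * k ** (n - k + 1)

def compare(n):
    """Return (LB(n, k), L_n) for k = ceil(n / ln n); meaningful for n >= 3."""
    k = math.ceil(n / math.log(n))
    return LB(n, k), count_terms(n, 0)
```

For example, `compare(4)` returns the pair `(9, 82)`: with $k = 3$ there are $C(1) \cdot 3^2 = 9$ such terms among the 82 closed terms of size 4.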

6.2 The number of λ's in a term

Theorem 9.

1. The density of terms having more than $\frac{3n}{\ln(n)}$ λ's is 0.

2. The density of terms having less than $\frac{n}{3\ln(n)}$ λ's is 0.
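Proof. (1) Let $S(n,k)$ be the number of terms of size $n$ containing more than $\frac{kn}{\ln(n)}$
λ's. We have $S(n,k) \le \sum_{p \ge \frac{kn}{\ln(n)}} UB(n,p)$ where $UB(n,p) = M(n, n-p+1) \cdot p^{\,n-p+1}$.
This is because a term with $p$ lambdas is a unary-binary tree whose
$n-p+1$ leaves can be bound by, at most, $p$ lambdas each. For $k > 1$ the function
$p \mapsto p^{\,n-p+1}$ is decreasing for $p \ge \frac{kn}{\ln(n)}$. Thus, for every $k > 1$, we have

$$S(n,k) \le M(n) \left(\frac{kn}{\ln(n)}\right)^{n+1-\frac{kn}{\ln(n)}}.$$

Using our lower bound for $L_n$, we find $\frac{S(n,k)}{L_n} \lesssim \Phi(n, 4-\varepsilon)$ where

$$\Phi(n,q) = \frac{M(n) \left(\frac{kn}{\ln(n)}\right)^{n+1-\frac{kn}{\ln(n)}}}{\left(\frac{qn}{\ln(n)}\right)^{n-\frac{n}{\ln(n)}}}.$$

To get the result it remains to show that, for $k = 3$ and any $\varepsilon > 0$, $\Phi(n, 4-\varepsilon)$
tends to 0.

Using that $M(n)$ grows like $\left(\frac{1}{3-2\sqrt{2}}\right)^{n} \frac{1}{n^{3/2}}$ up to a constant factor, we have, for $n$ large enough (we introduce
an extra constant $C > 1$ to compensate for the equivalent),

$$\Phi(n,q) \le \frac{C}{n^{3/2}} \left(\frac{1}{3-2\sqrt{2}}\right)^{n} \frac{\left(\frac{kn}{\ln(n)}\right)^{n+1-\frac{kn}{\ln(n)}}}{\left(\frac{qn}{\ln(n)}\right)^{n-\frac{n}{\ln(n)}}}.$$

We get a simpler upper bound by using the $\frac{1}{n^{3/2}}$ to compensate for the $+1$ exponent:

$$\Phi(n,q) \le \left(\frac{k}{q(3-2\sqrt{2})}\right)^{n} \left(\frac{kn}{\ln(n)}\right)^{-\frac{kn}{\ln(n)}} \left(\frac{qn}{\ln(n)}\right)^{\frac{n}{\ln(n)}}.$$

Remarking that $n^{\frac{n}{\ln(n)}} = e^{n}$, so that $\left(\frac{kn}{\ln(n)}\right)^{-\frac{kn}{\ln(n)}} = e^{-kn} \left(\frac{k}{\ln(n)}\right)^{-\frac{kn}{\ln(n)}}$ and
$\left(\frac{qn}{\ln(n)}\right)^{\frac{n}{\ln(n)}} = e^{n} \left(\frac{q}{\ln(n)}\right)^{\frac{n}{\ln(n)}}$, we have:

$$\Phi(n,q) \le \left(\frac{k e^{1-k}}{q(3-2\sqrt{2})}\right)^{n} \left(\frac{q \ln(n)^{k-1}}{k^{k}}\right)^{\frac{n}{\ln(n)}}.$$

The second factor is $e^{o(n)}$. This means that $\Phi(n,q)$ converges toward zero if $\frac{k e^{1-k}}{q(3-2\sqrt{2})} < 1$. Since $k e^{1-k}$
reaches its maximum 1 at $k = 1$ and $0 < q(3-2\sqrt{2}) < 4(3-2\sqrt{2}) < 1$ (recall
that we will use $q = 4 - \varepsilon$ with $\varepsilon > 0$), the equation $k e^{1-k} = q(3-2\sqrt{2})$ has two
solutions, one for $k > 1$, the other for $k < 1$. It is easy to see that the first solution
is smaller than 3 because $3e^{1-3} < 4(3-2\sqrt{2})$ and $\varepsilon = 4 - q$ can be chosen
small enough.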

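(2) The proof of the second part of the theorem is analogous. The computation
is essentially the same with $k < 1$. It is easy to check that the solution (less than
1) of the equation $k e^{1-k} = q(3 - 2\sqrt{2})$ is greater than $\frac{1}{3}$, so that $k = \frac{1}{3}$ satisfies
$k e^{1-k} < q(3 - 2\sqrt{2})$ once $\varepsilon$ is small enough. □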
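The position of the two solutions of $k e^{1-k} = q(3 - 2\sqrt{2})$ relative to the constants $3$ and $\frac{1}{3}$ of Theorem 9 can be checked numerically. The sketch below uses plain bisection in the limiting case $q = 4$; the helper names and the choice of bisection are assumptions made for illustration.

```python
import math

def gap(k, q):
    """k * e^(1-k) - q * (3 - 2*sqrt(2)); its zeros are the two solutions."""
    return k * math.exp(1 - k) - q * (3 - 2 * math.sqrt(2))

def bisect(lo, hi, q, steps=100):
    """Locate a zero of gap(., q) in [lo, hi], assuming a sign change on the interval."""
    f_lo = gap(lo, q)
    for _ in range(steps):
        mid = (lo + hi) / 2
        f_mid = gap(mid, q)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

q = 4.0                              # limiting case epsilon -> 0
small_root = bisect(0.0, 1.0, q)     # solution with k < 1
large_root = bisect(1.0, 3.0, q)     # solution with k > 1
```

Since $k e^{1-k}$ increases on $(0,1)$ and decreases on $(1,\infty)$, the convergence condition $k e^{1-k} < q(3-2\sqrt{2})$ holds exactly for $k$ below `small_root` or above `large_root`.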
Remark 10. The theorem above shows that the typical proportion of unary nodes
to binary nodes in λ-terms is far from the typical proportion in ordinary
unary-binary trees, which tends to a positive constant.



6.3 Upper bound for $L_n$

Theorem 11. For all $\varepsilon > 0$ we have

$$L_n \lesssim \left(\frac{(12+\varepsilon)\,n}{\ln(n)}\right)^{n - \frac{n}{3\ln(n)}}.$$

Proof. Let $N_n$ be the number of terms of size $n$ with less than $\frac{3n}{\ln(n)}$ and more than
$\frac{n}{3\ln(n)}$ λ's. Note that, according to Theorem 9, we have $L_n \sim N_n$. We have

$$N_n \le C\!\left(n - \tfrac{n}{3\ln(n)} + 1\right) \binom{3n}{\frac{3n}{\ln(n)}} \left(\frac{3n}{\ln(n)}\right)^{n - \frac{n}{3\ln(n)} + 1}$$

where

• $C\!\left(n - \frac{n}{3\ln(n)} + 1\right)$ corresponds to the number of possible choices of the binary
structure (which has less than $n - \frac{n}{3\ln(n)}$ inner nodes).

• $\binom{3n}{\frac{3n}{\ln(n)}}$ is an asymptotic upper bound for the number of possible distributions of
unary nodes within the binary structure.

• $\left(\frac{3n}{\ln(n)}\right)^{n - \frac{n}{3\ln(n)} + 1}$ corresponds to the possibilities of variable bindings. Indeed,
$\frac{3n}{\ln(n)}$ is an upper bound for the number of lambdas above a variable and
$n + 1 - \frac{n}{3\ln(n)}$ is an upper bound for the number of leaves.

Now it is sufficient to observe that $\binom{3n}{\frac{3n}{\ln(n)}}$ is subexponential. (The replacement
of $12^n$ by $(12+\varepsilon)^n$ compensates all factors smaller than exponential.)

□



6.4 Comparison between the lower bound and the upper bound

In the ratio between our lower and upper bounds, the dominant factor is exponential.
This means that we are far from having an equivalent, but still this is not too bad
because $L_n$ is super-exponential.

The following corollary shows that we know the first two terms of the asymptotic
expansion of $\ln(L_n)$, but we do not know the linear factor yet.

Corollary 12. For all $\varepsilon > 0$ and for $n$ large enough

$$\ln(4 - \varepsilon) - 1 < \frac{\ln(L_n)}{n} - \ln(n) + \ln(\ln(n)) < \ln(12 + \varepsilon) - \frac{1}{3}.$$
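The corollary follows by taking logarithms of the two bounds. The computation below is a sketch under the assumption that the bounds of Theorems 8 and 11 are $\left(\frac{(4-\varepsilon)n}{\ln n}\right)^{n - \frac{n}{\ln n}}$ and $\left(\frac{(12+\varepsilon)n}{\ln n}\right)^{n - \frac{n}{3\ln n}}$ respectively.

```latex
\begin{align*}
\ln\left(\left(\frac{(4-\varepsilon)n}{\ln n}\right)^{n-\frac{n}{\ln n}}\right)
  &= \left(n-\frac{n}{\ln n}\right)\bigl(\ln(4-\varepsilon)+\ln n-\ln\ln n\bigr)\\
  &= n\bigl(\ln(4-\varepsilon)+\ln n-\ln\ln n-1\bigr)+o(n),\\[4pt]
\ln\left(\left(\frac{(12+\varepsilon)n}{\ln n}\right)^{n-\frac{n}{3\ln n}}\right)
  &= \left(n-\frac{n}{3\ln n}\right)\bigl(\ln(12+\varepsilon)+\ln n-\ln\ln n\bigr)\\
  &= n\bigl(\ln(12+\varepsilon)+\ln n-\ln\ln n-\tfrac{1}{3}\bigr)+o(n).
\end{align*}
```

Dividing by $n$ leaves an $o(1)$ error on each side, which is absorbed by slightly changing $\varepsilon$; the terms $-1$ and $-\frac{1}{3}$ come from the products $\frac{n}{\ln n}\cdot\ln n$ and $\frac{n}{3\ln n}\cdot\ln n$.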

6.5 Bounds on the unary height of a term

Theorem 13. The set of terms with the unary height greater than $\frac{n}{3\ln(n)}$ has density 1.

Proof. The same argument as in Theorem 9 (2) applies here. □


7 Proofs using coding

7.1 The number of λ's in head position

Theorem 14. Let $g(n) \in o\!\left(\sqrt{n/\ln(n)}\right)$. The density of terms having less than
$g(n)$ head λ's is 0.

Proof. Let us denote by $A_n$ the set of typical terms of size $n$ with less than $g(n)$
head λ's. We construct an injective, size-preserving function (coding) $\varphi: A_n \to \Lambda_n$
such that its image has density 0.

Let $t \in A_n$. We can write $t = \lambda x_1 \ldots \lambda x_p.M$, where $p < g(n)$ and $M$ is a term
starting with an application and containing at least one λ (by Theorem 13). Let
$B$ be the maximal purely applicative prefix of $M$, i.e. $B$ is the term using only
application nodes and variables such that $M = B[\vec{t}\,]$ where terms in $\vec{t}$ start with
λ and variables in $B$ are taken from the set $\{x_1, \ldots, x_p\}$ (see Figure 1).




Figure 1: Theorem 14, the term $t \in A_n$.

Let us denote by $A(n,p,b,\vec{t}\,)$ the set of terms from $A_n$ having, as in the
decomposition of $t$ above, $p$ head λ's, then a purely applicative context (i.e. a context
without any lambda) of size $b$, and, in that context, a sequence $\vec{t}$ of subterms
beginning with λ's. Because $p < g(n)$ the cardinality of $A(n,p,b,\vec{t}\,)$ is less than

$$P(b,n) = C(b+1)\,(g(n)+1)^{b+1}.$$

Let $t \in A(n,p,b,\vec{t}\,)$ where $\vec{t} = [t_1, \ldots, t_k]$. By hypothesis on $A_n$, we have
$k \ge 1$. Let $t_i = \lambda z_i.u_i$. Let $z$ be a fresh variable and $u'_i = u_i[z_i := z]$. Consider the
term $T = \lambda z \lambda x_1 \ldots \lambda x_p.(u'_1\ (u'_2\ (\ldots u'_k) \ldots))$, which is of size $n - b$. Let $\lambda y.C$
denote the term rooted at the leftmost deepest λ of the term $T$ and let $Y$ be the set of
variables introduced by the λ's occurring on the path from the root to $\lambda y$. By
Theorem 13 there are at least $\frac{n}{3\ln(n)}$ elements in $Y$.

Let $U$ be the set of purely applicative terms of size $b-1$ whose variables are
chosen from $Y$. For any $u \in U$, let $\rho(t,u)$ be the term obtained by substituting the
sub-term $\lambda y.C$ in $T$ with $\lambda y.(u\ C)$.

There are at least

$$Q(b,n) = C(b-1) \left(\frac{n}{3\ln(n)}\right)^{b}$$

elements in $U$. Since for $n$ large enough we have $P(b,n) < Q(b,n)$ (because the
limit of the quotient is 0), there exists an injective function $h$ which assigns to any
purely applicative prefix $B$ of size $b$ an element from $U$. Let $\varphi(t) = \rho(t, h(B))$ where
$B$ is the purely applicative prefix in the decomposition of $t$ (see Figure 2). By the
injectivity of $h$, we get that $\varphi$ is injective, too.






We also define $\Psi(t) = \{\rho(t,u) : u \in U\}$. Note that for $t \in A(n,p,b,\vec{t}\,)$ the
cardinality of $\Psi(t)$ is always $Q(b,n)$. Due to the construction, the sets $\Psi(t)$ and $\Psi(t')$ are
disjoint for any pair of distinct terms $t$ and $t'$.




Figure 2: Theorem 14, the term $\varphi(t)$.

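Let us denote $\psi(b,n) = \frac{P(b,n)}{Q(b,n)}$. By the assumption on $g$ there is a function $\varepsilon$
such that $\varepsilon(n)$ tends to 0 and

$$\psi(b,n) = \frac{C(b+1)}{C(b-1)} \left(\frac{3\ln(n)\,(g(n)+1)}{n}\right)^{b-1} \varepsilon(n), \qquad \varepsilon(n) = \frac{3\ln(n)\,(g(n)+1)^2}{n}.$$

For $n$ large enough the middle factor is at most 1 for every $b \ge 1$, and $\frac{C(b+1)}{C(b-1)}$ is bounded.
Thus, $\psi(b,n)$ tends to 0 uniformly in $b$.
Since the $A(n,p,b,\vec{t}\,)$ form a partition of $A_n$, the result follows. □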

7.2 Head λ's bind "many" occurrences

Theorem 15. Let $g(n) \in o\!\left(\sqrt{n/\ln(n)}\right)$. The density of terms in which there is at
least one λ among the $g(n)$ head λ's that does not bind any variable is 0.

Proof. Let $g(n) \in o\!\left(\sqrt{n/\ln(n)}\right)$ and denote by $T^g$ the set of random terms for which
there exists at least one λ among the first $g(n)$ head λ's that does not bind any variable,
and let $T^g_n = T^g \cap \Lambda_n$. We construct a coding function $\varphi: T^g_n \to \Lambda_n$ such that the
density of its image is 0.

Let $T = \lambda x_1 \ldots \lambda x_{g(n)}.A$ be a term from $T^g_n$ and let $i$ be the smallest integer such
that the $i$-th head λ in $T$ does not bind any variable. Take

$$\varphi(T) = \lambda x_1 \ldots \lambda x_{i-1} \lambda x_{i+1}.(x_{i+1}\ (\lambda x_{i+2} \ldots \lambda x_{g(n)}.A)).$$

The size of $\varphi(T)$ is $n$. Terms from the set $\varphi(T^g_n)$ have less than $g(n)$ head λ's, so,
by Theorem 14, the density of them in the set $\Lambda_n$ is zero. Since the function $\varphi$ is
injective, the density of $T^g$ is also zero. □
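The coding used in this proof can be sketched on a concrete term representation. The tuple encoding, the function names, and the restriction that the non-binding head λ is followed by another head λ are assumptions of this sketch; head variable names are assumed pairwise distinct.

```python
# Terms as nested tuples: ("var", x) | ("lam", x, body) | ("app", t, u)

def size(t):
    if t[0] == "var":
        return 0
    if t[0] == "lam":
        return size(t[2]) + 1
    return size(t[1]) + size(t[2]) + 1

def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def rebuild(names, body):
    """Put the lambdas named in `names` (outermost first) back on top of body."""
    for x in reversed(names):
        body = ("lam", x, body)
    return body

def coding(t, g):
    """Drop the first non-binding lambda among the first g head lambdas and
    compensate its size with an application headed by the next head variable."""
    names, body = [], t
    while len(names) < g and body[0] == "lam":
        names.append(body[1])
        body = body[2]
    if len(names) < g:
        raise ValueError("t has fewer than g head lambdas")
    for i, x in enumerate(names):
        if x not in free_vars(rebuild(names[i + 1:], body)):
            break
    else:
        raise ValueError("each of the first g head lambdas is binding")
    if i + 1 == len(names):
        raise ValueError("this sketch needs a head lambda after the non-binding one")
    nxt = names[i + 1]
    inner = rebuild(names[i + 2:], body)
    return rebuild(names[:i], ("lam", nxt, ("app", ("var", nxt), inner)))

# Example: \a.\b.\c.(a c), where \b binds nothing.
T = ("lam", "a", ("lam", "b", ("lam", "c", ("app", ("var", "a"), ("var", "c")))))
image = coding(T, 3)   # \a.\c.(c (a c))
```

The removed unary node and the added binary node cancel out, so the image has the same size as the input while having one head λ fewer.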

Theorem 16. Let $g(n) \in o(\ln(n))$. The density of terms in which the total number
of occurrences of variables bound by the first three λ's is at most $g(n)$ is 0.

Proof. Let $g(n) \in o(\ln(n))$ and denote by $T_{n,g(n)}$ the set of random terms of size $n$
in which the total number of occurrences of variables bound by the first three λ's is at
most $g(n)$. We construct coding functions $\varphi_n: T_{n,g(n)} \to \Lambda_n$ such that the density
of the union of the images of all these functions in $\Lambda$ is zero.






Let us define an equivalence relation $\sim_n$ on the set of random terms of size $n$ in
the following way: $M \sim_n N$ iff $M$ and $N$ are equal after substituting all occurrences
of variables bound by the first three λ's by the variable bound by the first λ. Let us
denote by $[M]$ the equivalence class of $M$.

Let $T = \lambda x_1 \lambda x_2 \lambda x_3.A$ be a term from $T_{n,g(n)}$. There are at most $3^{g(n)}$ elements
in the class $[T]$.

Let $T' = \lambda x \lambda y.A[x_1 := y, x_2 := y, x_3 := y]$. The size of $T'$ is $n-1$. Let us consider
$\lambda a.U$, the sub-term of $T'$ such that $\lambda a$ is the leftmost deepest λ in $T'$. Denote by
$B(T)$ the set of variables bound by λ's occurring in $T'$ on the path from $\lambda a$ to $\lambda y$.
Note that the variable $x$ occurs neither in $T'$ nor in $B(T)$. By Theorem 13
there are at least $\frac{n}{3\ln(n)} - 3$ such λ's. Since $3 \le \frac{n}{6\ln(n)}$, there are at least

$$\frac{n}{3\ln(n)} - \frac{n}{6\ln(n)} = \frac{n}{6\ln(n)}$$

elements in $B(T)$. As $g(n) \in o(\ln(n))$, we have

$$\lim_{n \to \infty} \frac{3^{g(n)}}{\left(\frac{n}{6\ln(n)}\right)} = 0.$$

Thus, we can find for each class $[T]$ an injective function $h_{[T]}$ from $[T]$ into the set
$B(T)$.

We define $\varphi_n(T)$ as the term obtained from $T'$ by replacing the sub-term $\lambda a.U$
with $\lambda a.((y\ B)\ U)$, where $B = h_{[T]}(T)$.

All terms from the image $\varphi_n(T_{n,g(n)})$ start with a λ that binds no variable. By
Theorem 15 we know that the set of such terms has density zero in $\Lambda_n$. Since $\varphi_n$ is
injective, the density of $\bigcup_{n \in \mathbb{N}} T_{n,g(n)}$ is zero, as well. □

Theorem 17. For any fixed integers $k$ and $k'$, the density of terms in which each
of the first $k$ λ's binds more than $k'$ variables is 1.

Proof. Let us fix integers $k$, $k'$ and let $g(n) = \sqrt{\ln(n)}$. We assume that $k \ge 3$. By
Theorem 16 the total number of occurrences of variables bound by the first $k$ λ's in a
random term of size $n$ is more than $g(n)$.

For each $n$ and $q \ge g(n)$ let $A(n,q)$ be the set of typical terms of size $n$ having
exactly $q$ leaves bound by the first $k$ lambdas and let $B(n,q)$ be the set of terms in
$A(n,q)$ for which one of the first $k$ λ's binds at most $k'$ variables.

Consider the equivalence relation defined analogously to the relation $\sim_n$ from
the proof of Theorem 16 but with respect to the first $k$ (instead of three) head
λ's. For $T \in A(n,q)$ the cardinality of $[T] \cap A(n,q)$ is $k^q$ and the cardinality of
$[T] \cap B(n,q)$ is at most $k \cdot k' \cdot q^{k'} \cdot (k-1)^{q-k'}$ and thus the quotient is less than

$$\psi(q) = \frac{k\,k'\,q^{k'}\,(k-1)^{q-k'}}{k^{q}}$$

which, since $\psi$ is eventually decreasing, is less than $\psi(g(n))$.

Since the $[T] \cap A(n,q)$ give a partition of $A(n,q)$ and the $A(n,q)$ give a partition
of the set of typical terms of size $n$, and since $\psi(g(n))$ has limit 0 when $n$ tends to
$\infty$, this finishes the proof. □



7.3 The width of a term

Let us recall that the λ-width of a term is the maximal number of incomparable
binding lambdas in the term. In the following theorem we show that the λ-width
tends to be very low for typical λ-terms.

Theorem 18. The density of terms having λ-width at most 2 is 1.

Proof. Let us denote by $W$ the set of terms with λ-width greater than 2. As usual
we put $W_n = W \cap \Lambda_n$. We show that there exists an injective, size-preserving
function $\varphi: W_n \to \Lambda_n$ such that its image has density 0. Let $t$ be an element of
$W_n$ and let us denote by $\lambda x$, $\lambda y$ and $\lambda z$ the three highest, pairwise incomparable
binding λ's (appearing in this order from left to right in $t$).

Figure 3: Theorem 18, the terms $t$ and $\varphi(t)$.

Let $\lambda x.A$, $\lambda y.B$ and $\lambda z.C$ be the sub-terms rooted at those λ's (see Figure 3). Let
$A' = A[x := y]$, let $a$ be a new variable, let $C'$ be the term obtained from $C$ by
replacing the leftmost occurrence of $z$ with $a$ and the others (possibly none) with $y$.
Let $\varphi(t)$ be the term obtained from $t$ by adding $\lambda a$ at the root, substituting both
sub-terms $\lambda x.A$ and $\lambda z.C$ with $a$ and replacing the leftmost occurrence of $y$ in $B$
with the term $(A'\ C')$. We have $\mathrm{size}(\varphi(t)) = \mathrm{size}(t)$. Also note that since we chose the
highest three incomparable λ's, no variable becomes free in the constructed term.
The injectivity of $\varphi$ comes from the fact that both $\lambda y$ and the sub-term $(A'\ C')$ of
$\varphi(t)$ are uniquely identifiable (see Figure 3):

• Let $v_l$ (resp. $v_r$) be the deepest node above the two left-most (resp. right-most)
occurrences of $a$. Remark that since there are exactly 3 occurrences of
$a$, one of these two nodes is above the other. Let $v$ be the deepest one. $\lambda y$ is
the first binding λ on the path from the node $v$ to the middle occurrence of $a$;

• then, the application node $(A'\ C')$ is the deepest node above the middle
occurrence of $a$ and all the occurrences of $y$ on the left of this middle occurrence
of $a$.

Since the image of $\varphi$ contains only terms starting with a λ which binds only 3
occurrences of the corresponding variable, by Theorem 17 the density of $\varphi(W_n)$ is
equal to zero. The injectivity of $\varphi$ finishes the proof. □

7.4 A random term avoids any fixed closed term

Definition 19. Let $t_0$ be a term. We denote by $\Lambda^{t_0}$ the set of terms having $t_0$ as
a sub-term and by $\Lambda^{t_0}_n$ the set $\Lambda^{t_0} \cap \Lambda_n$.

Theorem 20. Let $t_0$ be a term of size $k'$ with $k$ occurrences of free variables.
Assume $k' \ge k + 1$. Then the density of $\Lambda^{t_0}$ is 0.

Proof. We construct a size-preserving coding $\varphi: \Lambda^{t_0}_n \to \Lambda_n$ such that its image is of
density 0.

There are at most $k' - k + 1$ occurrences of λ's and at most $k' + 1$ leaves in $t_0$,
so there are at most

$$K = M(k')\,(k'+1)^{k'+1}$$

such terms and we can enumerate them in a fixed way. Let $m$ be the number of
$t_0$ in this enumeration. The tree contains at least one occurrence of λ, since otherwise we would have
$k' < k$. Let $g \in o\!\left(\sqrt{n/\ln(n)}\right)$ be such that $g(n) \to \infty$ as $n \to \infty$. Let $n$ be an integer satisfying
$g(n) \ge K$.

Let $t \in \Lambda^{t_0}_n$ be a random term. By Theorem 14 the term $t$ has more than $m$
head λ's since $m \le K$ (see Figure 4).




Figure 4: Theorem 20, the terms $t \in \Lambda^{t_0}_n$ and $T'$.

Let us consider the term $T$ which is obtained from the term $t$ by adding an
additional unary node (labelled with $\lambda x$) at depth $m$. Let us define $\varphi(t)$ as the
term $T'$ obtained by replacing the left-most deepest sub-term $t_0$ in $T$ by the term
$t_1 = (U\ B)$ of size $k' - 1$ (see Figure 4), where $U$ is a binary tree such that
$U = (x\ (x\ (\ldots (x\ x) \ldots)))$ and $B = (x_1\ (x_2\ (\ldots (x_{k-1}\ x_k) \ldots)))$ (in the case where
$t_0$ has no free variables we put $t_1 = U$). Thus, the size of $T'$ is equal to $n$. The
variable $x$ is bound by the $m$-th λ in the tree $T'$. Since $m$ is the number of the tree
$t_0$, the function $\varphi$ is injective.

By Theorem 17 each of the $K$ head λ's in a random tree of size $n$ binds more than
$k'$ variables. Trees from the image $\varphi(\Lambda_n \cap \Lambda^{t_0})$ do not have this property, since the
$m$-th λ binds only $k'$ variables. Thus, those trees are negligible among all trees of
size $n$. □

Corollary 21. Let $t_0$ be a term. If $t_0$ is closed or if there are at least two λ's in
$t_0$, the density of $\Lambda^{t_0}$ is 0.

Proof. These are special cases of the previous theorem. □

7.5 The density of strongly normalizable terms

From Theorem 18 we know that almost all terms are of width at most 2. In this
section, we introduce a notion of 'safeness' for terms of width 2 with the following two
properties:

• safeness and width at most 2 imply strong normalization (Proposition 23);

• the set of unsafe terms of width 2 has density 0 (proposition [50)1 ).

The first part of this section is devoted to Proposition 23 and is pure λ-calculus.
We tried to write the proofs so as to be accessible to non-specialists in the λ-calculus.
Nevertheless, for Fact 1 below, see [I], and for similar proof techniques, see

mm)-

Definition 22 (fair and safe terms).

1. Let $t$ be a term of width 1. We say that $t$ is fair if there is no binding λ on
the left branch of $t$ (this includes the root node of $t$).

2. Let $t$ be a term of width 2 and let $(u\ v)$ be the smallest sub-term of $t$ of width
2. By definition, $u$ and $v$ have width 1. We say that $t$ is safe if at least one
of the terms $u$ or $v$ is fair.
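Fairness and safeness are purely syntactic and can be checked by a simple traversal. The sketch below assumes a tuple encoding of terms and reads "left branch" as the path that goes through λ's and to the function side of each application; these choices and all names are illustrative assumptions.

```python
# Terms as nested tuples: ("var", x) | ("lam", x, body) | ("app", t, u)

def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def is_binding(lam):
    return lam[1] in free_vars(lam[2])

def lambda_width(t):
    """Maximal number of pairwise incomparable binding lambdas."""
    if t[0] == "var":
        return 0
    if t[0] == "lam":
        w = lambda_width(t[2])
        return max(w, 1) if is_binding(t) else w
    return lambda_width(t[1]) + lambda_width(t[2])

def is_fair(t):
    """No binding lambda on the left branch of t, root included."""
    while t[0] != "var":
        if t[0] == "lam":
            if is_binding(t):
                return False
            t = t[2]
        else:
            t = t[1]
    return True

def is_safe(t):
    """Safeness of a term of width 2: descend to the smallest sub-term (u v)
    of width 2 and test whether u or v is fair."""
    if lambda_width(t) != 2:
        raise ValueError("safeness is only defined here for terms of width 2")
    while True:
        if t[0] == "lam":
            t = t[2]                    # the body still has width 2
            continue
        u, v = t[1], t[2]
        if lambda_width(u) == 2:
            t = u
        elif lambda_width(v) == 2:
            t = v
        else:                           # both sides have width 1
            return is_fair(u) or is_fair(v)

omega = ("app",
         ("lam", "x", ("app", ("var", "x"), ("var", "x"))),
         ("lam", "y", ("app", ("var", "y"), ("var", "y"))))
safe_example = ("lam", "z",
                ("app",
                 ("app", ("var", "z"), ("lam", "x", ("var", "x"))),
                 ("lam", "y", ("var", "y"))))
```

Here `omega` has width 2 and is not safe, since both sides start with a binding λ, whereas `safe_example` is safe because the left side of its smallest width-2 application is headed by a variable.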

Proposition 23. Let $t$ be a safe term of width at most 2; then $t \in SN$.

Definitions and notation 1.

• Let $t$ be a term. We denote by $\eta(t)$ the length of the longest
reduction starting from $t$, and $+\infty$ if $t$ is not SN.

• Let $\sigma$ be a substitution (that is, a partial map from variables to terms). We
write $t[\sigma]$ for the capture-free application of the substitution to $t$.

• A context is a λ-term with a unique hole denoted $[]$. Traditionally contexts
are defined by a BNF. If $E$ is an arbitrary context, it is given by the following
BNF:

$E := [] \mid \lambda x.E \mid (E\ \Lambda) \mid (\Lambda\ E)$ where $\Lambda$ denotes arbitrary terms.

• When $E$ is a context and $t$ is a term, $E[t]$ denotes the replacement of the hole
in $E$ by $t$ allowing capture: the λ's in $E$ can bind variables in $t$.

• For a context $E$, $\eta(E) = \eta(E[x])$ and $\mathrm{size}(E) = \mathrm{size}(E[x])$ where $x$ is an
arbitrary variable not captured by $E$.

Fact 1 (Basic fact on λ-terms and strong normalization). For some proofs in this
section, we use the fact that a λ-term can be written in one of the following forms:

• $t = (x\ t_1 \ldots t_n)$ with $n \ge 0$, in which case $\eta(t) = \eta(t_1) + \cdots + \eta(t_n)$ and $t$ is
SN if and only if $t_1, \ldots, t_n$ are SN.

• $t = \lambda x.u$, in which case $\eta(t) = \eta(u)$ and $t$ is SN if and only if $u$ is SN.

• $t = ((\lambda x.u)\ v\ t_1 \ldots t_n)$ with $n \ge 0$, in which case $\eta(t) > \eta(u[x := v]\ t_1 \ldots t_n)$
and $t$ is SN if and only if $(u[x := v]\ t_1 \ldots t_n)$ is SN.

Moreover, if $t$ is a term and $x$ is a variable, then $t$ is SN if and only if $(t\ x)$ is SN.
This can be shown by induction on the size of $t$ using the above case analysis.

Lemma 24.

1. The set of terms of width 0 (resp. of width at most 1) is closed by reduction.

2. If $t$ is a term of width at most 1 then $t \in SN$.

Proof. (1) For width 0 this is easy because substitution and reduction cannot bind
variables and width 0 means that no λ is binding. For width 1, we first remark
that width 1 means that all binding λ's occur on the same branch. We consider
such a term $t$ and a β-reduction: $t = E[(\lambda x.u)\ v] \to E[u[x := v]] = t'$. There are
two cases: either $\lambda x$ is not binding and $t' = E[u]$, or $\lambda x$ is binding and $v$ must
have width 0, which means that all the variables of $v$ are free or bound by the
context. In both cases, it is clear that $t'$ is still of width 1 because the binding λ's
remain on one branch.

(2) follows from the fact that a reduction decreases the pair $(N_1(t), N_0(t))$ for the
lexicographic ordering, where $N_1(t)$ (resp. $N_0(t)$) is the number of binding (resp.
non-binding) λ's. To prove this, we consider again such a term $t$ and a β-reduction:
$t = E[(\lambda x.u)\ v] \to E[u[x := v]] = t'$. If $\lambda x$ is not binding, $N_1(t)$ is non-increasing
(it is decreasing if $v$ contains some binding λ's) and $N_0(t)$ is decreasing (we erase
at least one non-binding λ). If $\lambda x$ is binding, we know that $v$ is of width 0 and
contains no binding λ, which means that we create no binding λ and therefore $N_1(t)$
is decreasing. □

Lemma 25. If $u$ has width 0 and $t_1, \ldots, t_n$ are SN terms then the term $(u\ t_1 \ldots t_n)$
is SN.

Proof. By induction on the size of $u$, we distinguish three cases:

• If $u = x$, the result is trivial by Fact 1.

• If $u = (u'\ v)$, because of Lemma 24 $v$ is SN and we conclude by induction on
$u'$.

• If $u = \lambda x.u'$, we use Lemma 24 if $n = 0$ and we get that $(u'\ t_2 \ldots t_n)$ is SN
by induction otherwise. □

Lemma 26. Let t ∈ SN be a term and σ be a substitution such that, for each x, 
there is k such that σ(x) = (u v_1 ... v_k) where u has width 0 and v_1 ... v_k are SN. 
Then t[σ] ∈ SN. 

Proof. By induction on (η(t), size(t)). We consider the following cases: 

• If t = (x t_1 ... t_n) where x is not in the domain of σ, or t = λx.t_1: in this case, 
it is enough to prove that for all i, t_i[σ] is SN. This follows from the induction 
hypothesis because η(t_i) ≤ η(t) and size(t_i) < size(t). 

• If t = ((λx.u) v t_1 ... t_n), we have to show that (u[x := v] t_1 ... t_n)[σ] is SN, 
which follows from the induction hypothesis because η(u[x := v] t_1 ... t_n) < 
η(t). 

• If t = (x t_1 ... t_n) and x is in the domain of σ, then t[σ] = (σ(x) t_1[σ] ... t_n[σ]), 
which is SN by lemma 25 because t_1[σ], ..., t_n[σ] are SN by induction hypothesis 
and σ(x) = (u v_1 ... v_k) where u has width 0 and v_1 ... v_k are SN. □ 

Definition 27. We define the set of contexts of width 1 by the following BNF (where 
Λ_0 denotes the set of λ-terms of width 0): 

E := [] | λx.E | (E Λ_0) | (Λ_0 E) 

This definition means that all the binding λ's are on the path from the root to 
the hole of the context. This justifies the name of such a context. 

Lemma 28. Let E be a context of width 1 and u ∈ SN be a term. Then E[u] ∈ SN. 

Proof. By induction on size(E). The cases E = [] or E = λx.E_1 are trivial (by 
induction in the second case because size(E_1) < size(E)). 

If E = (E_1 v) where v ∈ Λ_0, then E[u] = (E_1[u] x)[x := v] where x is a 
fresh variable and E_1[u] is SN by induction hypothesis because size(E_1) < size(E). 
Therefore (E_1[u] x) is SN by fact [T] and finally (E_1[u] x)[x := v] is SN by 
lemma 26. 

The case E = (v E_1) is symmetric. □ 
Proof of proposition 23. If t has width at most one, this is lemma 24. If t has width 
2, let (t_1 t_2) be the smallest sub-term of t of width 2. This means that t can be 
written E[t_1 t_2] where E is a context of width 1 and t_1 and t_2 have width 1. By 
lemma 28 it is therefore enough to show that (t_1 t_2) is SN. 

We know that t is safe. This means that either t_1 or t_2 is fair. If t_i is fair, it can 
be written F[u v] where u has width 0, v has width 1 and F is a context using the 
following BNF: 

F := [] | λ_.F | (F Λ_0) where λ_. denotes non-binding λ's and Λ_0 terms of width 0. 

The context F is defined precisely to denote the beginning of the left branch 
until we reach an application node whose argument is of width 1. The definition of 
fair term of width 1 ensures the existence of such an application node on the left 
branch. 

This means that (t_1 t_2) can be written (F[u v] t_2) (resp. (t_1 F[u v])). Let us 
define t' = (F[x] t_2) (resp. t' = (t_1 F[x])). 

Therefore in both cases, (t_1 t_2) = t'[x := (u v)], because the context F cannot bind 
variables. Thus, we can conclude by lemma 26 because u has width 0 and because 
t' and v are SN (by lemma 24, since they have width 1). □ 

To end this section and the proof of the main result (corollary 31), we establish 
a density result about unsafe terms of width 2. 

Lemma 29. The density of the set A of terms containing two consecutive 
non-binding λ's is 0. 

Proof. We define an injective and size-preserving coding from A to the set of terms 
whose leading λ binds only once, and the proof follows from theorem 17. The 
coding is as follows: in a term t ∈ A of size n, we replace the subterm t_1 rooted at 
an occurrence of 2 non-binding lambdas, t_1 = λa.λb.u, by the term x u where x is a 
fresh variable. We get a term t' of size n − 1 and the final coded term is λx.t', which 
is of size n. □ 
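As an illustration of this coding, here is a small worked example. It assumes that the size of a term is its number of λ's and application nodes (variables having size 0); the particular term is chosen here for illustration only.

```latex
\begin{align*}
t  &= \lambda y.\,\bigl(y\;(\lambda a.\lambda b.\,y)\bigr)
   && \text{size } 4,\ \lambda a \text{ and } \lambda b \text{ bind nothing} \\
t' &= \lambda y.\,\bigl(y\;(x\;y)\bigr)
   && \text{size } 3,\ \lambda a.\lambda b.\,y \text{ replaced by } (x\;y) \\
\lambda x.t' &= \lambda x.\lambda y.\,\bigl(y\;(x\;y)\bigr)
   && \text{size } 4,\ \text{the leading } \lambda x \text{ binds exactly once}
\end{align*}
```

The original term can be recovered from the coded one by locating the unique occurrence of x and replacing the application (x u) by λa.λb.u, which is why the coding is injective.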

Proposition 30. The set of unsafe lambda terms with lambda width 2 has density 
0. 

Proof. For every such term, the root of the minimal subterm of width 2 is called 
the branching node and is always binary. 

Let us divide the set of unsafe terms of width 2 into two parts: 

S_1: the set of terms such that the lengths of both paths from the branching node 
to the two highest independent binding lambdas are not greater than ln(n), 

S_2: the set of remaining unsafe terms with lambda width 2. 

The set S_1 can be encoded to the set of all terms in the following two steps. First, 
remove the two highest independent binding lambdas and put one lambda, binding 
their variables, at the root of the whole term. The resulting size is smaller by 1 
and the branching node is uniquely determined. Second, insert one lambda that 
binds nothing between the head lambdas of the term. According to Theorem 14 we 
have more than ln(n)^2 head lambdas. Therefore we can encode the lengths of the 
paths from the branching node to the two highest binding lambdas by the position 
of this new lambda. Theorem 15 grants that the image of such a transformation has 
density 0. 

For the set S_2 we proceed as follows. First, choose the path that is longer than 
ln(n). Let t_0 be the subterm rooted at the binding lambda at the end of this path 
(we assume it is the left path, the case of the right one is analogous). By lemma 29 
we can suppose that at least half of the nodes on this path are binary. Let t_1, ..., t_k 
be the right subtrees of the consecutive binary nodes on the path (the path always 
goes to the left since the term is unsafe). Second, choose some leaf v belonging to 
some subtree t_1, ..., t_k and exchange it with the subterm t_0. Independently of the 
choice of the leaf, the encoding can be reversed since the position of t_0 is uniquely 
identifiable as the highest binding lambda of the fair subtree of the branching node. 
The encoding is size-preserving and the number of possibilities for the choice of a 
leaf v exceeds ln(n)/2, therefore S_2 has density 0. □ 

Corollary 31. The set of strongly normalizable terms has density 1. 

Proof. First, by theorem [TSl we can focus on terms of width at most 2. Proposition 30 
shows in addition that we can restrict to the following types of terms: 

• terms of width at most 1, 

• safe terms of width 2. 

Proposition 23 shows that they are all strongly normalizable. □ 

8 Combinatory logic 

Definition 32. 

1. The set C of combinators is defined by the following grammar: 

   C := K | S | I | (C C) 

2. The size of a combinator is defined by the following rules: size(S) = size(K) = 
size(I) = 1 and size((u v)) = size(u) + size(v). 

3. The reduction on combinators is the closure by contexts of the following rules: 

   (K u v) ▷ u     (S u v w) ▷ (u w (v w))     (I u) ▷ u 
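The definition above can be made concrete with a minimal Python sketch. The encoding is an assumption made here for illustration: atoms are the strings "S", "K", "I" and an application (u v) is the pair (u, v); the helper names `size`, `app` and `reducts` are hypothetical.

```python
# Combinators as nested tuples: an atom is "S", "K" or "I";
# an application (u v) is the pair (u, v).

def size(c):
    """Size as in Definition 32: atoms count 1, (u v) adds both sizes."""
    if isinstance(c, str):
        return 1
    u, v = c
    return size(u) + size(v)

def app(head, *args):
    """Left-nested application: app(u, a, b) stands for ((u a) b)."""
    for a in args:
        head = (head, a)
    return head

def reducts(c):
    """All combinators obtained from c by exactly one reduction step."""
    out = []
    if isinstance(c, tuple):
        f, x = c
        # Redexes at the root: (I u), (K u v), (S u v w).
        if f == "I":
            out.append(x)
        if isinstance(f, tuple) and f[0] == "K":
            out.append(f[1])
        if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == "S":
            u, v, w = f[0][1], f[1], x
            out.append(((u, w), (v, w)))
        # Closure by contexts: reduce inside either side of the application.
        out.extend((f2, x) for f2 in reducts(f))
        out.extend((f, x2) for x2 in reducts(x))
    return out

example = app("K", "S", app("I", "I"))  # the combinator (K S (I I))
print(size(example))     # 4
print(reducts(example))  # ['S', (('K', 'S'), 'I')]
```

The example has two redexes, the K-redex at the root and the I-redex in its second argument, so it has two one-step reducts.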

Remark: It is easy to see that the number of internal nodes in a binary tree 
represented by a combinator is smaller by 1 than its size. Therefore, all the results 
concerning densities would be the same if we had defined the size as a number of 
internal nodes (like we have for λ-terms). 

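Proposition 33. 

1. The generating function f enumerating the set of combinators is 

   \[ f(z) = \frac{1 - \sqrt{1 - 12z}}{2}. \]

2. The generating function f_{t_0} enumerating the set of all combinators having t_0 as 
a sub-term is 

   \[ f_{t_0}(z) = \frac{-\sqrt{1 - 12z} + \sqrt{1 - 12z + 4z^{n_0}}}{2}, \]

   where n_0 = size(t_0). 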

Proof. 

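1. A combinator is either one of the three atoms S, K, I (each of size 1) or an 
application (u v), so the function f satisfies 

   \[ f(z) = 3z + f(z)^2. \]

   Solving the equation and choosing between the two possibilities (f(0) = 0) 
gives the solution. 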
2. Assume that n_0 = size(t_0). Every combinator t having t_0 as a sub-term is 
either t_0 itself or has the form t = (t_1 t_2) where either t_0 is a sub-term of t_1 
but not of t_2, or t_0 is a sub-term of t_2 but not of t_1, or finally t_0 is a 
sub-term of both t_1 and t_2. This gives the following equation: 

   \[ f_{t_0}(z) = z^{n_0} + 2 f_{t_0}(z)\,\bigl(f(z) - f_{t_0}(z)\bigr) + \bigl(f_{t_0}(z)\bigr)^2 \]

which can be simplified to 

   \[ f_{t_0}(z) = z^{n_0} + 2 f_{t_0}(z)\, f(z) - \bigl(f_{t_0}(z)\bigr)^2. \]

Solving the quadratic equation with unknown function f_{t_0} and choosing between 
the two possibilities (f_{t_0}(0) = 0) gives the solution. 

□ 
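The two functional equations in this proof can be checked against brute-force enumeration for small sizes. The sketch below is illustrative only: the tuple encoding of combinators, the helper names and the choice t_0 = (S K) are assumptions made here. It reads the coefficients off the equations as convolution recurrences and also compares the total count with the closed form 3^n C_{n-1}, where C_m is the m-th Catalan number (plane binary trees with n leaves, each labelled by one of three atoms).

```python
from functools import lru_cache
from itertools import product
from math import comb

@lru_cache(maxsize=None)
def all_combinators(n):
    """Every combinator of size n; atoms are strings, (u v) is the pair (u, v)."""
    if n == 1:
        return ["S", "K", "I"]
    out = []
    for k in range(1, n):
        out.extend(product(all_combinators(k), all_combinators(n - k)))
    return out

def contains(t, t0):
    """True if t0 occurs as a sub-term of t."""
    if t == t0:
        return True
    return isinstance(t, tuple) and (contains(t[0], t0) or contains(t[1], t0))

def coefficients(n0, n_max):
    """Coefficients of f and f_{t0} up to z^n_max, read off the equations
    f = 3z + f^2 and f_{t0} = z^{n0} + 2 f_{t0} f - f_{t0}^2."""
    f = [0] * (n_max + 1)
    ft = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        f[n] = (3 if n == 1 else 0) + sum(f[k] * f[n - k] for k in range(1, n))
        ft[n] = (1 if n == n0 else 0) + sum(
            2 * ft[k] * f[n - k] - ft[k] * ft[n - k] for k in range(1, n))
    return f, ft

t0 = ("S", "K")                      # an arbitrary sub-term of size 2
f, ft = coefficients(2, 6)
for n in range(1, 7):
    terms = all_combinators(n)
    with_t0 = sum(contains(t, t0) for t in terms)
    print(n, len(terms), f[n], with_t0, ft[n])
```

Each printed row shows the enumerated count next to the coefficient predicted by the corresponding equation; the two should agree.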

In the next theorem, the symbol [z^n]{F(z)} represents the coefficient of z^n in the 
series expansion of the generating function F. 

Theorem 34. Let v, w be functions satisfying the following hypotheses: 

• v, w are analytic in |z| < 1 with z = 1 being the only singularity on the circle 
|z| = 1. 

• v(z), w(z) in the vicinity of z = 1 have expansions of the form 

   \[ v(z) = \sum_{p \ge 0} v_p (1 - z)^{p/2}, \qquad w(z) = \sum_{p \ge 0} w_p (1 - z)^{p/2}. \]

Let ṽ and w̃ be defined by ṽ(√(1 − z)) = v(z) and w̃(√(1 − z)) = w(z). Then 

   \[ \lim_{n \to \infty} \frac{[z^n]\{v(z)\}}{[z^n]\{w(z)\}} = \frac{(\tilde v)'(0)}{(\tilde w)'(0)}. \]

Proof. This is a standard result in the theory of generating functions. For example, 
see [n]. □ 

Theorem 35. Let t_0 be a combinator. The density of combinators having t_0 as a 
sub-term is 1. 

Proof. The proof uses standard tools on generating functions. It follows from 
Proposition 33 and Theorem 34 and some easy computations. 

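In order to satisfy the assumptions of Theorem 34, we normalize the functions so 
that the singularity closest to the origin is located at z = 1. So, we define the 
functions f̂_{t_0}(z) = f_{t_0}(z/12) and f̂(z) = f(z/12). Therefore we have: 

   \[ \hat f_{t_0}(z) = \frac{-\sqrt{1 - z} + \sqrt{1 - z + 4\left(\frac{z}{12}\right)^{n_0}}}{2}, \qquad \hat f(z) = \frac{1}{2} - \frac{1}{2}\sqrt{1 - z}. \]

This representation reveals that the singularity of f̂_{t_0}(z) and f̂(z) closest to the 
origin, located in |z| ≤ 1, is indeed z = 1. We have to remember that rescaling the 
radius of convergence of the functions f_{t_0} and f changes accordingly the sequences 
represented by the new functions: they enumerate the two sequences 
12^{-n} [z^n]{f_{t_0}(z)} and 12^{-n} [z^n]{f(z)}, so the ratio of their coefficients is 
unchanged. Now let us define functions f̃ and f̃_{t_0} so as to satisfy the following 
equations: f̃(√(1 − z)) = f̂(z) and f̃_{t_0}(√(1 − z)) = f̂_{t_0}(z). 

Functions f̃ and f̃_{t_0} are defined in the following way: 

   \[ \tilde f(z) = \frac{1 - z}{2}, \qquad \tilde f_{t_0}(z) = \frac{-z + \sqrt{z^2 + 4\left(\frac{1 - z^2}{12}\right)^{n_0}}}{2}. \]

The derivatives (f̃_{t_0})' and (f̃)' are the following: 

   \[ (\tilde f)'(z) = -\frac{1}{2}, \qquad (\tilde f_{t_0})'(z) = \frac{1}{2}\left(-1 + \frac{2z - \frac{8 n_0 z}{12}\left(\frac{1 - z^2}{12}\right)^{n_0 - 1}}{2\sqrt{z^2 + 4\left(\frac{1 - z^2}{12}\right)^{n_0}}}\right). \]

Finally, (f̃_{t_0})'(0) = −1/2 and (f̃)'(0) = −1/2. To conclude the proof we apply 
Theorem 34: 

   \[ \lim_{n \to \infty} \frac{[z^n]\{f_{t_0}(z)\}}{[z^n]\{f(z)\}} = \lim_{n \to \infty} \frac{12^{-n}\,[z^n]\{f_{t_0}(z)\}}{12^{-n}\,[z^n]\{f(z)\}} = \frac{(\tilde f_{t_0})'(0)}{(\tilde f)'(0)} = 1. \]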

□ 
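The limit can also be observed numerically. The following sketch is not part of the proof: it assumes only the two functional equations f = 3z + f^2 and f_{t_0} = z^{n_0} + 2 f_{t_0} f − f_{t_0}^2, and the helper names are hypothetical. It computes the exact proportion of combinators of size n that contain a fixed sub-term of size n_0. The value n_0 = 2 is chosen for illustration; convergence appears to be much slower for larger n_0.

```python
from fractions import Fraction

def coefficients(n0, n_max):
    """Coefficients of f and f_{t0} up to z^n_max, read off the equations
    f = 3z + f^2 and f_{t0} = z^{n0} + 2 f_{t0} f - f_{t0}^2."""
    f = [0] * (n_max + 1)
    ft = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        f[n] = (3 if n == 1 else 0) + sum(f[k] * f[n - k] for k in range(1, n))
        ft[n] = (1 if n == n0 else 0) + sum(
            2 * ft[k] * f[n - k] - ft[k] * ft[n - k] for k in range(1, n))
    return f, ft

def density(n0, n):
    """Exact proportion of combinators of size n containing a fixed t0 of size n0."""
    f, ft = coefficients(n0, n)
    return Fraction(ft[n], f[n])

for n in (10, 50, 100, 200):
    print(n, float(density(2, n)))
```

For n_0 = 1 the proportion has the closed form 1 − (2/3)^n, since the combinators avoiding a given atom are exactly those built from the other two atoms; this gives an independent check of the recurrence.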

Theorem 36. The density of non strongly normalizing combinators is 1. 

Proof. Let Ω = (S I I (S I I)). Then Ω reduces to itself and is thus not strongly 
normalizing. The theorem is thus an immediate consequence of Theorem 35. □ 
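The fact that the combinator (S I I (S I I)) reduces to itself can be checked mechanically. The sketch below is illustrative; it assumes combinators encoded as nested pairs with atoms "S", "K", "I", and the helper names are hypothetical. It computes all terms reachable in exactly k reduction steps and confirms that the starting term reappears after three steps.

```python
def reducts(c):
    """All combinators obtained from c by exactly one reduction step."""
    out = []
    if isinstance(c, tuple):
        f, x = c
        if f == "I":                                   # (I u) > u
            out.append(x)
        if isinstance(f, tuple) and f[0] == "K":       # (K u v) > u
            out.append(f[1])
        if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == "S":
            u, v, w = f[0][1], f[1], x                 # (S u v w) > (u w (v w))
            out.append(((u, w), (v, w)))
        out.extend((f2, x) for f2 in reducts(f))       # reduce in the function part
        out.extend((f, x2) for x2 in reducts(x))       # reduce in the argument
    return out

def reachable_in(c, k):
    """Set of terms reachable from c in exactly k reduction steps."""
    level = {c}
    for _ in range(k):
        level = {r for t in level for r in reducts(t)}
    return level

SII = (("S", "I"), "I")
omega = (SII, SII)                    # (S I I (S I I))

print(omega in reachable_in(omega, 3))   # True
```

One such cycle is (S I I (S I I)) ▷ (I (S I I) (I (S I I))) ▷ (S I I (I (S I I))) ▷ (S I I (S I I)): the S-redex at the root fires first, then each of the two I-redexes.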



9 Discussion 

9.1 Other notions of size 

The difference between Theorem 21 in the λ-calculus and Theorem 35 in combinatory 
logic may be surprising since there are translations between these systems 
which respect many properties (including strong normalization). However, these 
translations do not preserve the size. 

The usual translation, which we denote by T_1, from combinatory logic to λ-calculus 
is linear, i.e. there is a constant k such that, for all terms, size(T_1(t)) ≤ 
k * size(t). Note that this translation is far from being surjective: its image has 
density 0. Moreover, the usual translation T_2 in the other direction (see [IJ) is not 
homogeneous: linear for some terms and non-linear for others. The point is that T_2 
has to code the variable binding in some way and this takes space. 

The difference between the two theorems probably comes from the definition 
of size that we have used for the variables in the λ-calculus. The usual way to 
implement coding of variables is to replace the names of variables by their de Bruijn 
indices: a variable is replaced by the number of λ's that occur on the path from 
the variable to the λ that binds it. Note that, in this case, different occurrences of 
the same variable may be represented by different indices. 

Choosing the way in which we code de Bruijn indices gives different ways of 
defining the size of a term. This can be done in the following ways: 

- using unary notation, i.e. the size of the index n is simply n itself; 

- using binary notation, i.e. the size of the index n is ⌈log_2(n)⌉, i.e. the logarithm 
of n in base 2. 
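To make these notions of size concrete, here is a minimal Python sketch. Several points are assumptions made for illustration and are not fixed by the text: terms are encoded as tuples ("var", x), ("lam", x, body), ("app", f, a); the index of a variable counts the λ's from the variable up to and including its binder, so indices start at 1; each λ and each application node contributes 1 to the size; only closed terms are handled.

```python
def to_de_bruijn(term, binders=()):
    """Replace variable names by de Bruijn indices (closed terms only).
    binders[0] is the innermost enclosing lambda."""
    kind = term[0]
    if kind == "var":
        return ("idx", binders.index(term[1]) + 1)
    if kind == "lam":
        return ("lam", to_de_bruijn(term[2], (term[1],) + binders))
    return ("app", to_de_bruijn(term[1], binders), to_de_bruijn(term[2], binders))

def size(term, index_size):
    """Size of a de Bruijn term: 1 per lambda or application node,
    plus index_size(n) for every index n."""
    kind = term[0]
    if kind == "idx":
        return index_size(term[1])
    if kind == "lam":
        return 1 + size(term[1], index_size)
    return 1 + size(term[1], index_size) + size(term[2], index_size)

def zero(n):      # variables of size 0
    return 0

def unary(n):     # unary notation: the index n has size n
    return n

def binary(n):    # binary notation: ceil(log2(n)), computed exactly for n >= 1
    return (n - 1).bit_length()

# lambda x. lambda y. (x (lambda z. x)): the two occurrences of x get indices 2 and 3
term = ("lam", "x", ("lam", "y",
        ("app", ("var", "x"), ("lam", "z", ("var", "x")))))
db = to_de_bruijn(term)
print(db)
print(size(db, zero), size(db, unary), size(db, binary))   # 4 9 7
```

The example shows the remark made above: the same variable x is represented by two different indices, so its occurrences contribute differently to the size in the unary and binary models.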
9.2 Some experiments 



Although the results we proved concern only the model where the size of a variable 
is 0, we did some experiments on the other models. There is an easy algorithm 
(polynomial time in n) to compute L_n for each model of size. This algorithm can 
sometimes be adapted to compute (still in polynomial time) the number of terms 
of size n having a given property P. We did this for several simple syntactical 
properties up to size 1000. It is always a strange exercise to guess the limit of a 
sequence from its first values, but our results, at least, suggest the following: 

- Almost all terms start with several λ's for the model with constant-size variables 
(99.99% start with at least one λ for size 1000), whereas it is not clear that terms 
starting with an application are negligible for other models; 

- Identity almost always (exceptions represent a fraction of terms less than 10~^ 
for size 500) occurs for models with non-constant size of variables, whereas at least 
80% of terms don't contain identity for the model with variables of constant size (for 
variables of size 0, we know that it goes toward 100%). 
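The polynomial-time counting mentioned at the start of this subsection can be sketched as follows for the model with variables of size 0. This is an assumed reconstruction, not necessarily the algorithm used for the experiments: it takes L_n to be the number of closed terms of size n up to renaming of bound variables, with size equal to the number of λ's and application nodes, and counts terms whose free variables lie among k given ones. The function names are hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count(n, k):
    """Number of terms of size n whose free variables lie among k given ones."""
    if n == 0:
        return k                       # a single variable, k possible choices
    total = count(n - 1, k + 1)        # a lambda: one more variable in scope
    # An application: the remaining size n - 1 is split between both sides.
    total += sum(count(i, k) * count(n - 1 - i, k) for i in range(n))
    return total

def L(n):
    """Number of closed terms of size n."""
    return count(n, 0)

print([L(n) for n in range(1, 6)])     # [1, 3, 14, 82, 579]
```

With memoization there are O(n^2) pairs (n, k) and each costs O(n) arithmetic operations. The recursion depth grows with n, so sizes such as 1000 would need an iterative table instead of this recursive form.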

9.3 Future work and open questions 

We give here some questions for which it would be desirable to have an answer. 

- Give an asymptotic equivalent for L_n or, at least, better upper and lower 
bounds. 

- Give the density of typable terms. Numerical experiments done by Jue Wang 
(see [16]) seem to show that this density is 0. 

- Compute the densities of strongly normalizing terms with other notions of size 
(mainly by changing the size of variables, and possibly making it non-constant). 
Unless we can simplify the present proof of density 1 of SN terms (corollary 31), it 
seems very difficult to extend this result, even to variables of size 1: most 
encoding techniques really use the fact that variables have size 0. However, we 
believe that proving theorem [ij] is an achievable goal for variables of size 1. 